Data and code to replicate the main results in "Algorithmic Selection and Supply of Political News on Facebook" by Erwan Dujeancourt and Marcel Garz in The Information Society.

The dataset and scripts were compiled using Stata IC 14.2 for Windows (64-bit), including the add-ons coefplot and estout, as well as R 4.0.3 for Windows (64-bit), including the packages BTM, data.table, dplyr, haven, quanteda, stopwords, stringr, tidyr, tidyverse, and udpipe.

In the *root directory*, the main analysis do-file is "fb tw analyze.do", which uses the main dataset "fb tw micro data w vars.dta" and the auxiliary dataset "fb tw micro data placebo.dta".

The folder *intermediate files and code* includes the raw data and scripts used to compile the main dataset:

1) The file "fb tw micro data.dta" includes the tweets and posts as downloaded from Twitter and Facebook, respectively.

2) The file "followers twitter.xlsx" contains manually collected information on newspapers' (pre-algorithm) number of followers.

3) The file "monthly active users.xlsx" lists the global number of active Twitter users extracted from Twitter's annual reports.

4) The files "SentiWS_v2.0_Negative.txt" and "SentiWS_v2.0_Positive.txt" come from https://downloads.wortschatz-leipzig.de/etc/SentiWS/SentiWS_v2.0.zip and are used to identify negatively and positively connoted words in Step 5.

5) The script "analyze content.R" uses "fb tw micro data.dta" to compute various text statistics (e.g., number of words, shares of negative and positive words) on the tweet and post messages, resulting in the output file "text_stats.dta".

6) The script "analyze topics.R" uses "fb tw micro data.dta" to estimate topic models on the tweet and post messages, resulting in "ll_compare_fit.csv" (log likelihood values to compare different topic models), "top_terms18.csv" (the top terms associated with each topic for the model with k = 18 topics), and "topics_by_item18.csv" (tweet- and post-specific topic weights for the model with k = 18 topics).

7) The script "plot log likelihood.do" produces Figure A1.

8) The script "find Twitter-Facebook matches.R" uses "fb tw micro data.dta" to compute each tweet's maximum cosine similarity for possible twins on Facebook, with results stored in the folder "max cosine files".

9) The script "fb tw micro data w vars compile.do" combines all intermediate files described in Steps 1 to 8, resulting in the main analysis dataset "fb tw micro data w vars.dta" in the root directory.
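
For readers unfamiliar with the matching idea in Step 8, the following is a minimal Python sketch of how a tweet's maximum cosine similarity against a set of candidate Facebook posts could be computed. It is an illustration only and not the code in "find Twitter-Facebook matches.R": the function names, the simple word-count representation, and the example texts are all assumptions, and the actual script may tokenize, weight, or restrict candidate posts differently.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens (assumed tokenization)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no meaningful similarity
    return dot / (norm_a * norm_b)


def max_cosine(tweet, posts):
    """Highest similarity between one tweet and any candidate post."""
    tweet_vec = bag_of_words(tweet)
    return max(
        (cosine_similarity(tweet_vec, bag_of_words(p)) for p in posts),
        default=0.0,  # no candidate posts at all
    )


# Hypothetical example texts, not taken from the dataset.
tweet = "Bundestag beschließt neues Gesetz"
posts = ["Neues Gesetz im Bundestag beschlossen", "Wetter am Wochenende"]
print(round(max_cosine(tweet, posts), 3))
```

A value close to 1 indicates that some post uses nearly the same words as the tweet, while a value of 0 means that no candidate post shares a single word with it.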




